Statistical Learning of Semitic Morphology Using Autosegmental Orthography

Author

  • Paul Rodrigues
Abstract

The root-and-pattern system and the system of reduplication are essential to the morphological analysis of Arabic words (McCarthy 1979, 1981). Few computational morphology systems have been designed to parse concatenative morphology together with roots and reduplication without the help of a dictionary. Using simple statistics, we present an algorithm that can learn both the concatenative morphology and the roots and templates. The approach is analogous to the tier-based autosegmental approach developed by Goldsmith (1976) and applied to Semitic languages by McCarthy (1979).
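
To make the tier idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes words arrive in a simple Latin transliteration in which only a, i, and u are vowels; the names `split_tiers` and `tier_counts` are hypothetical. Each word is split into a consonantal tier and a vowel template, and simple counts over a word list show which consonant sequences and templates recur.

```python
from collections import Counter

# Assumption: simple Latin transliteration where only a, i, u are vowels.
# The orthography used in the paper itself is not specified here.
VOWELS = set("aiu")

def split_tiers(word):
    """Return (consonant_tier, template) for a transliterated word.

    The template keeps the vowels and marks every consonant position as "C".
    """
    consonants = "".join(ch for ch in word if ch not in VOWELS)
    template = "".join(ch if ch in VOWELS else "C" for ch in word)
    return consonants, template

def tier_counts(words):
    """Count how often each consonant tier and each template occurs."""
    roots = Counter()
    templates = Counter()
    for word in words:
        consonants, template = split_tiers(word)
        roots[consonants] += 1
        templates[template] += 1
    return roots, templates

if __name__ == "__main__":
    roots, templates = tier_counts(["kataba", "kutiba", "darasa"])
    print(roots.most_common())
    print(templates.most_common())
```

In this toy setting, consonant sequences that recur across several templates are candidate roots. A real system would also need to separate affix consonants from root consonants, which this sketch does not attempt.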

Similar resources

Edge-in association and OCP ‘violations’ in Tigrinya

An important issue in the application of autosegmental principles to Semitic morphology is the way the independent consonantal root is associated to the template provided by the morphology. The three most obvious proposals are from left to right, from right to left, and from the edges in toward the center. I argue that association in the Ethiopian Semitic language Tigrinya is from the edges in,...

Morpho-syntactically Annotated Amharic Treebank

In this paper, we describe an ongoing project of developing a treebank for Amharic. The main objective of developing the treebank is to use it as an input for the development of a parser. Morphologically rich languages such as Arabic, Amharic and other Semitic languages present challenges to the state of the art in parsing. In such languages, morphemes play important functions in both morphology and syn...

Syllable-Based Speech Recognition for Amharic

Amharic is the Semitic language with the second-largest number of speakers after Arabic (Hayward and Richard 1999). Its writing system is syllabic with Consonant-Vowel (CV) syllable structure. Amharic orthography has more or less a one-to-one correspondence with syllabic sounds. We have used this feature of Amharic to develop a CV syllable-based speech recognizer, using Hidden Markov Modeling...

Lexical Representation of Multiword Expressions in Morphologically-complex Languages

In spite of the surging interest in multiword expressions (MWEs) in recent years, it is still unclear how such expressions should be stored in computational lexicons. This problem is amplified in morphologically-complex languages, where the unique properties of MWEs interact with non-trivial morphological processes. We propose an architecture for lexical representation of MWEs, augmented ...

Identifying Semitic Roots: Machine Learning with Linguistic Constraints

Words in Semitic languages are formed by combining two morphemes: a root and a pattern. The root consists of consonants only, by default three, and the pattern is a combination of vowels and consonants, with non-consecutive “slots” into which the root consonants are inserted. Identifying the root of a given word is an important task, considered to be an essential part of the morphological analy...
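
The root-and-pattern combination described above can be sketched as follows. This is an illustrative toy, not the method of the cited paper. It assumes a Latin transliteration in which the pattern marks each root slot with an uppercase "C" and writes its own vowels and consonants in lowercase; the function name `interdigitate` is hypothetical.

```python
def interdigitate(root, pattern, slot="C"):
    """Insert the root consonants, in order, into the slots of the pattern.

    Assumption: every occurrence of `slot` in `pattern` is a root position;
    all other characters belong to the pattern and are copied unchanged.
    """
    if pattern.count(slot) != len(root):
        raise ValueError("number of slots must match number of root consonants")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in pattern)

if __name__ == "__main__":
    print(interdigitate("ktb", "CaCaCa"))   # root k-t-b in a CaCaCa pattern
    print(interdigitate("ktb", "maCCuuC"))  # pattern with its own consonant m
```

Root identification is the inverse problem: given only the surface word, decide which consonants fill the slots and which belong to the pattern.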

Publication date: 2005